13 research outputs found

    Coloring Sums of Extensions of Certain Graphs

    Recall that the minimum number of colors that allows a proper coloring of a graph $G$ is called the chromatic number of $G$ and is denoted by $\chi(G)$. In this paper, the concepts of the $\chi'$-chromatic sum and the $\chi^+$-chromatic sum are introduced. The extended graph $G^x$ of a graph $G$ was recently introduced for certain regular graphs. We extend the concepts of the $\chi'$-chromatic sum and the $\chi^+$-chromatic sum to extended paths and cycles. The paper concludes with patterned structured graphs. Comment: 12 pages
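
    As a small illustration of the chromatic number recalled in this abstract, the sketch below finds it by brute force for tiny graphs. The function names and the edge-list representation are assumptions made for this example and are not taken from the paper; the chromatic sums introduced there are not implemented here.

```python
from itertools import product


def is_proper(edges, coloring):
    """Return True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(n, edges):
    """Brute-force chromatic number of a graph on vertices 0..n-1.

    Tries k = 1, 2, ... colors and returns the first k that admits a
    proper coloring. Exponential in n, so only suitable for tiny graphs.
    """
    if n == 0:
        return 0
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            if is_proper(edges, coloring):
                return k
    return n  # unreachable for simple graphs: n colors always suffice


# Hypothetical example graphs: a path on 4 vertices and a cycle on 5.
path4 = [(0, 1), (1, 2), (2, 3)]
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
```

    A path with at least one edge needs two colors, while an odd cycle needs three, which is what the search returns for these two inputs.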

    Coloring sums of extensions of certain graphs

    We recall that the minimum number of colors that allows a proper coloring of a graph $G$ is called the chromatic number of $G$ and is denoted $\chi(G)$. Motivated by the introduction of the $b$-chromatic sum of a graph, the concepts of the $\chi'$-chromatic sum and the $\chi^+$-chromatic sum are introduced in this paper. The extended graph $G^x$ of a graph $G$ was recently introduced for certain regular graphs. This paper extends the concepts of the $\chi'$-chromatic sum and the $\chi^+$-chromatic sum to extended paths and cycles. Bipartite graphs also receive some attention. The paper concludes with patterned structured graphs, which are typically found in chemical and biological structures.

    Improved imbalanced classification through convex space learning

    Imbalanced datasets for classification problems, characterised by an unequal distribution of samples across classes, are abundant in practical scenarios. Oversampling algorithms generate synthetic data to improve classification performance on such datasets. In this thesis, I discuss two algorithms, LoRAS and ProWRAS, that improve on the state of the art, as shown through rigorous benchmarking on publicly available datasets. A biological application, the detection of rare cell types from single-cell transcriptomics data, is also discussed. The thesis also provides a better theoretical understanding of oversampling.
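
    To make the idea of oversampling concrete, here is a minimal sketch that generates one synthetic minority-class point as a random convex combination of existing minority points. This is a generic illustration only: it is not LoRAS or ProWRAS, and the function name, its parameters, and the toy data are assumptions made for this example.

```python
import random


def convex_combination_sample(points, k, rng):
    """Return one synthetic point inside the convex hull of k chosen points.

    points: list of equal-length numeric lists (minority-class samples).
    k:      how many existing points to combine (assumed 1 <= k <= len(points)).
    rng:    a random.Random instance, so results are reproducible.
    """
    chosen = rng.sample(points, k)
    # Draw strictly positive weights and normalise them to sum to 1.
    weights = [1.0 - rng.random() for _ in chosen]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(chosen[0])
    return [sum(w * p[i] for w, p in zip(weights, chosen)) for i in range(dim)]


# Hypothetical toy minority class in two dimensions.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
synthetic = [convex_combination_sample(minority, 3, rng) for _ in range(5)]
```

    Because the weights are non-negative and sum to one, every synthetic point stays inside the convex hull of the points it was built from, so it cannot fall outside the region already occupied by the minority class.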

    Self-Attention-Based Models for the Extraction of Molecular Interactions from Biological Texts

    For any molecule, network, or process of interest, keeping up with new publications is becoming increasingly difficult. For many cellular processes, the number of molecules and interactions that need to be considered can be very large. Automated mining of publications can support large-scale molecular interaction maps and database curation. Text mining and Natural-Language-Processing (NLP)-based techniques are finding applications in mining the biological literature, handling problems such as Named Entity Recognition (NER) and Relationship Extraction (RE). Both rule-based and Machine-Learning (ML)-based NLP approaches have been popular in this context, with multiple research and review articles examining the scope of such models in Biological Literature Mining (BLM). In this review article, we explore self-attention-based models, a special type of Neural-Network (NN)-based architecture that has recently revitalized the field of NLP, applied to biological texts. We cover self-attention models for molecular interaction extraction, operating at either the sentence level or the abstract level, published from 2019 onwards. We conduct a comparative study of the models in terms of their architecture. Moreover, we discuss some limitations in the field of BLM that identify opportunities for the extraction of molecular interactions from biological text.

    Accounting for diverse feature-types improves patient stratification on tabular clinical datasets

    Tabular Clinical and Biomedical Routine Data (CBRD) contains diverse feature types. Recent research shows that the conventional application of Uniform Manifold Approximation and Projection (UMAP) to extract clusters from the low-dimensional embedding can prove ineffective because of the diverse feature types in such datasets. The Feature-type Distributed Clustering (FDC) workflow accounts for these diverse feature types, resulting in a more informative low-dimensional embedding. However, a rigorous assessment of the FDC algorithm has been missing so far. In this work, we conducted comprehensive benchmarking experiments to compare the quality of the cluster distributions and low-dimensional embeddings generated by FDC against those generated by UMAP, using standard objective measures: Silhouette score, Dunn index, and ANOVA. Our results confirm that FDC can indeed be the better choice for embedding tabular data with diverse feature types in low dimensions and thereby extracting clusters from such an embedding. In addition, we provide a rationale behind the choice of metrics proposed in the FDC workflow. Moreover, we point out some problems with the original Canberra metric used to reduce ordinal features in the FDC workflow and provide a solution in the form of a modified version of the Canberra metric. Using seven datasets from the medical domain for benchmarking, we demonstrate that FDC leads to improved patient stratification.
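
    For reference, the standard Canberra distance mentioned in this abstract can be sketched as follows. This is the textbook definition, with the common convention that a term whose denominator is zero contributes zero; the modified version proposed in the paper is not described in the abstract and is not reproduced here. The function name is an assumption made for this example.

```python
def canberra(x, y):
    """Standard Canberra distance: sum of |x_i - y_i| / (|x_i| + |y_i|).

    Terms where both coordinates are zero are skipped, which is a common
    convention that avoids division by zero.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom != 0:
            total += abs(a - b) / denom
    return total


# With this definition, any non-zero value compared against zero yields
# the maximal per-feature term of 1, regardless of its magnitude.
d_small = canberra([0], [1])
d_large = canberra([0], [5])
```

    Each term lies between 0 and 1, so the distance is sensitive to relative rather than absolute differences, which matters when features are encoded as small ordinal integers.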

    Contribution of Synthetic Data Generation towards an Improved Patient Stratification in Palliative Care

    AI model development for synthetic data generation to improve Machine Learning (ML) methodologies is an integral part of research in Computer Science and is currently being transferred to related medical fields, such as Systems Medicine and Medical Informatics. In general, the idea of personalized decision-making support based on patient data has motivated researchers in the medical domain for more than a decade, but the overall sparsity and scarcity of data are still major limitations. This is in contrast to currently applied technology, which allows us to generate and analyze patient data in diverse forms, such as tabular data on health records, medical images, genomics data, or even audio and video. One solution for overcoming these data limitations in medical records is the synthetic generation of tabular data based on real-world data. Consequently, ML-assisted decision support can be interpreted more conveniently, using more of the relevant patient data at hand. At a methodological level, several state-of-the-art ML algorithms generate and derive decisions from such data. However, there remain key issues that hinder broad practical implementation in real-life clinical settings. In this review, we give, for the first time, insights into current perspectives and potential impacts of using synthetic data generation in palliative care screening, because it is a challenging prime example of highly individualized, sparsely available patient information. Taken together, the reader will obtain initial starting points and suitable solutions relevant for generating and using synthetic data for ML-based screenings in palliative care and beyond.